Multilingual Transliteration Using Feature based Phonetic Method
Abstract
In this paper we investigate named entity transliteration based on a phonetic scoring method. The phonetic score is computed using phonetic features and carefully designed pseudo features. The proposed method is tested with four target languages (Arabic, Chinese, Hindi and Korean) and one source language (English), using comparable corpora. The method is developed from the phonetic method originally proposed in Tao et al. (2006). In contrast to the method in Tao et al. (2006), which was constructed on the basis of pure linguistic knowledge, the method in this study is trained using the Winnow machine learning algorithm. There is a salient improvement for Hindi and Arabic compared to the previous study. Moreover, we demonstrate that the method can achieve comparable results when it is trained on language data different from the target language. The method can therefore be applied both with minimal data and without target language data, for various languages.
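The abstract names Winnow as the training algorithm but gives no details of how it is applied. As a rough illustration only, the sketch below shows the classic Winnow update rule: multiplicative promotion and demotion of weights over binary features. The toy feature vectors, the threshold choice (set to the number of features), the promotion factor, and all function names are assumptions made for this example; this is not the authors' implementation or their feature set.

```python
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[int], int]  # (binary feature vector, label in {0, 1})


def winnow_predict(weights: Sequence[float], x: Sequence[int], threshold: float) -> int:
    """Predict 1 if the summed weight of the active features reaches the threshold."""
    score = sum(w for w, xi in zip(weights, x) if xi)
    return 1 if score >= threshold else 0


def winnow_train(
    examples: List[Example],
    n_features: int,
    alpha: float = 2.0,   # promotion/demotion factor (assumed value)
    epochs: int = 10,
) -> Tuple[List[float], float]:
    """Mistake-driven Winnow training with multiplicative updates."""
    weights = [1.0] * n_features
    threshold = float(n_features)  # a common textbook choice, assumed here
    for _ in range(epochs):
        mistakes = 0
        for x, y in examples:
            if winnow_predict(weights, x, threshold) == y:
                continue
            mistakes += 1
            # Promote active features on a missed positive, demote on a false positive.
            factor = alpha if y == 1 else 1.0 / alpha
            for i, xi in enumerate(x):
                if xi:
                    weights[i] *= factor
        if mistakes == 0:
            break  # every example is classified correctly
    return weights, threshold


# Toy data (hypothetical): the label is 1 when feature 0 or feature 1 is active.
examples: List[Example] = [
    ([1, 0, 0, 0], 1),
    ([0, 1, 0, 0], 1),
    ([0, 0, 1, 0], 0),
    ([0, 0, 0, 1], 0),
    ([0, 0, 1, 1], 0),
    ([1, 1, 0, 0], 1),
]
weights, threshold = winnow_train(examples, n_features=4)
```

In a transliteration setting, each binary feature could stand for a property of an aligned phone pair, and the learned weights could then feed a pairwise score. That mapping is a guess about how such a learner might be used, not something stated in the abstract.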
Similar resources
Hindi and Marathi to English Machine Transliteration Using SVM
Language transliteration is one of the important areas in NLP. Transliteration is very useful for converting named entities (NEs) written in one script to another script in NLP applications such as Cross Lingual Information Retrieval (CLIR), Multilingual Voice Chat Applications and Real Time Machine Translation (MT). The most important requirement of a transliteration system is to preserve the p...
Graphemes Sharing Phonetic Features Tend to Induce Similar Synesthetic Colors
Individuals with grapheme-color synesthesia experience idiosyncratic colors when viewing achromatic letters or digits. Despite large individual differences in grapheme-color association, synesthetes tend to associate graphemes sharing a perceptual feature with similar synesthetic colors. Sound has been suggested as one such feature. In the present study, we investigated whether graphemes of whi...
Punjabi Machine Transliteration
Machine transliteration transcribes a word written in one script into another language with approximate phonetic equivalence. It is useful for machine translation, cross-lingual information retrieval, and multilingual text and speech processing. Punjabi Machine Transliteration (PMT) is a special case of machine transliteration and is a process of converting a word from Shahmukhi (based on Arabic s...
Unsupervised Named Entity Transliteration Using Temporal and Phonetic Correlation
In this paper we investigate unsupervised name transliteration using comparable corpora, corpora where texts in the two languages deal in some of the same topics — and therefore share references to named entities — but are not translations of each other. We present two distinct methods for transliteration, one approach using an unsupervised phonetic transliteration method, and the other using t...
Combining probability models and web mining models: a framework for proper name transliteration
The rapid growth of the Internet has created a tremendous number of multilingual resources. However, language boundaries prevent information sharing and discovery across countries. Proper names play an important role in search queries and knowledge discovery. When foreign names are involved, proper names are often translated phonetically which is referred to as transliteration. In this research...
Publication date: 2007